CAMO: Category-Agnostic 3D Motion Transfer from Monocular 2D Videos

Anonymous Author

Overview


Abstract

Motion transfer from 2D videos to 3D assets is a challenging problem due to inherent pose ambiguities and diverse object shapes, and existing methods often require category-specific parametric templates. We propose CAMO, a category-agnostic framework that transfers motion to diverse target meshes directly from monocular 2D videos without relying on predefined templates or explicit 3D supervision. The core of CAMO is a morphology-parameterized articulated 3D Gaussian splatting model combined with dense semantic correspondences, which jointly adapts shape and pose through optimization. This design effectively alleviates shape-pose ambiguities, enabling visually faithful motion transfer across diverse categories. Experimental results demonstrate superior motion accuracy, efficiency, and visual coherence compared to existing methods, significantly advancing motion transfer across varied object categories and casual video scenarios.
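The abstract above describes jointly optimizing shape (morphology) and pose against dense 2D correspondences. The toy sketch below illustrates that idea only in spirit, not the paper's actual method: a scalar shape parameter and a rotation angle are optimized together so that a projected 3D point set matches 2D targets. All variable names and the optimization setup are hypothetical simplifications.

```python
import numpy as np

# Toy joint shape-pose optimization (illustrative only, not CAMO itself):
# scale a 3D point set by a "morphology" parameter s, rotate it by a
# "pose" angle theta, project orthographically, and fit both parameters
# to dense 2D correspondences by gradient descent.

base = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def project(points):
    # Orthographic projection: drop the z coordinate.
    return points[:, :2]

def loss(params, targets):
    s, theta = params
    pred = project((s * base) @ rot_z(theta).T)
    return np.mean((pred - targets) ** 2)

# Synthetic "correspondences" generated from s* = 1.5, theta* = 0.3.
targets = project((1.5 * base) @ rot_z(0.3).T)

# Jointly descend on both parameters via finite-difference gradients.
params = np.array([1.0, 0.0])
eps, lr = 1e-5, 0.3
for _ in range(500):
    grad = np.array([
        (loss(params + [eps, 0.0], targets) - loss(params - [eps, 0.0], targets)) / (2 * eps),
        (loss(params + [0.0, eps], targets) - loss(params - [0.0, eps], targets)) / (2 * eps),
    ])
    params -= lr * grad

print(params)  # approaches [1.5, 0.3]
```

Because shape and pose are optimized jointly rather than sequentially, the fit avoids the ambiguity where a wrong scale is compensated by a wrong rotation — the same intuition, at much larger scale, behind the shape-pose co-adaptation described above.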

2D-to-3D Animal Motion Retargeting

2D-to-3D Humanoid Motion Retargeting

2D-to-3D In-the-Wild Real-Video Retargeting

Ablation Study on Shape Parameterization

Video-to-Video Motion Transfer

Our approach enables seamless motion transfer between videos by reconstructing articulated 3D Gaussian splats from the source video and transferring the recovered motion to the target video.
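The retargeting step described above can be sketched in miniature: once articulated motion (per-joint rotations) is recovered from the source, the same rotations can drive a target rig with a different morphology. The 2D kinematic chain below is a hypothetical simplification for illustration; the function and variable names are not from the paper.

```python
import numpy as np

# Hypothetical sketch of motion retargeting: apply per-joint angles
# recovered from a source video to a target rig with different bone
# lengths, recomputing joint positions by forward kinematics.

def forward_kinematics(bone_lengths, joint_angles):
    """2D chain FK: each joint's angle is relative to its parent bone."""
    positions = [np.zeros(2)]
    total = 0.0
    for length, angle in zip(bone_lengths, joint_angles):
        total += angle  # accumulate relative rotations down the chain
        positions.append(positions[-1]
                         + length * np.array([np.cos(total), np.sin(total)]))
    return np.array(positions)

source_bones = [1.0, 1.0, 1.0]
target_bones = [0.5, 1.5, 0.8]   # different morphology than the source
motion = [0.2, -0.4, 0.7]        # per-joint angles recovered from the source

src_pose = forward_kinematics(source_bones, motion)
tgt_pose = forward_kinematics(target_bones, motion)  # same motion, new shape
```

Separating motion (angles) from morphology (bone lengths) is what lets the same recovered articulation animate targets of very different proportions.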

Note

This page has been carefully anonymized for double-blind review. We will release the code and data after the review process.